Managing Provenance in Scientific Workflows with ProvManager
Abstract
Running scientific workflows in distributed environments has motivated the definition of provenance-gathering approaches that are loosely coupled to workflow systems. We have proposed a provenance-gathering strategy that is independent of workflow-system technology, and this strategy has evolved into a provenance management system named ProvManager. Its main principle is that each workflow activity collects its own provenance data and publishes it to a repository that scientists can query. In this paper we show how provenance is captured across distributed, heterogeneous systems. Two main strategies are used to capture provenance: Prolog predicates that register provenance, and an API for communication between the wrapped activity and ProvManager.
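The principle described above can be sketched in a few lines. A wrapped activity records its own provenance as Prolog-style facts and publishes them to a shared repository. This is a minimal illustration under stated assumptions, not ProvManager's actual API. The predicate names (`activity/2`, `used/3`, `generated/2`, `duration/2`), the `ProvenanceRepository` class, and the `wrap_activity` function are all hypothetical. An in-memory list stands in for the real repository, and activities are assumed to take keyword arguments.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


def prolog_atom(value: Any) -> str:
    """Render any value as a quoted Prolog atom, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


@dataclass
class ProvenanceRepository:
    """Hypothetical stand-in for the shared repository that scientists query."""
    facts: list[str] = field(default_factory=list)

    def publish(self, facts: list[str]) -> None:
        # Append newly registered facts; a real system would send them over the network.
        self.facts.extend(facts)

    def query(self, predicate: str) -> list[str]:
        # Naive lookup: return every stored fact whose functor matches `predicate`.
        return [f for f in self.facts if f.startswith(predicate + "(")]


def wrap_activity(name: str, func: Callable[..., Any],
                  repo: ProvenanceRepository) -> Callable[..., Any]:
    """Hypothetical wrapper: run the activity, then publish its own provenance."""
    run_counter = 0

    def wrapped(**inputs: Any) -> Any:
        nonlocal run_counter
        run_counter += 1
        run_id = f"{name}_run{run_counter}"
        start = time.perf_counter()
        result = func(**inputs)
        elapsed = time.perf_counter() - start
        # Register provenance as Prolog facts (hypothetical predicate names).
        facts = [f"activity({prolog_atom(run_id)}, {prolog_atom(name)})."]
        for key, value in inputs.items():
            facts.append(
                f"used({prolog_atom(run_id)}, {prolog_atom(key)}, {prolog_atom(value)}).")
        facts.append(f"generated({prolog_atom(run_id)}, {prolog_atom(result)}).")
        facts.append(f"duration({prolog_atom(run_id)}, {elapsed:.6f}).")
        repo.publish(facts)
        return result

    return wrapped


if __name__ == "__main__":
    repo = ProvenanceRepository()
    normalize = wrap_activity(
        "normalize", lambda values: [v / max(values) for v in values], repo)
    normalize(values=[2, 4])
    print(repo.query("generated"))  # ["generated('normalize_run1', '[0.5, 1.0]')."]
```

Emitting facts as Prolog text keeps the capture side independent of any particular workflow engine. The repository can later load them into a Prolog engine and answer provenance queries with ordinary rules.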
Similar articles
Challenges in Managing Implicit and Abstract Provenance Data: Experiences with ProvManager
Running scientific workflows in distributed and heterogeneous environments has been motivating the definition of provenance gathering approaches that are loosely coupled to workflow management systems. We have developed a provenance management system named ProvManager to manage provenance data in distributed and heterogeneous environments independent of a specific Scientific Workflow Management...
Integrating Provenance Data from Distributed Workflow Systems with ProvManager
Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow execution engine. This kind of approach is interesting because it allows both storage and access to provenance data in an integrated way, even in an environment where different workflow management systems work together. Therefore, we h...
Isolation Levels for Data Sharing in Large-Scale Scientific Workflows
Scientists can benefit from Grid and Cloud infrastructures to face the increasing need to share scientific data and execute data-intensive workflows at a large scale. However, these workflows are creating more and more challenging problems in the automation of data management during execution. Existing workflow management systems focus on how data is stored and transferred, and on data provenance. H...
Project Histories: Managing Data Provenance Across Collection-Oriented Scientific Workflow Runs
While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully a...
Managing Rapidly-Evolving Scientific Workflows
We give an overview of VisTrails, a system that provides an infrastructure for systematically capturing detailed provenance and streamlining the data exploration process. A key feature that sets VisTrails apart from previous visualization and scientific workflow systems is a novel action-based mechanism that uniformly captures provenance for data products and workflows used to generate these pr...
Publication date: 2010